Term Clusters Evaluation by Montecarlo Sampling

Author

  • Nicolas Turenne
Abstract

The huge amount of textual information available in firms and institutions creates a need for robust textual data analysis systems. A new field called text mining aims to discover hidden information and to structure knowledge in texts. Statistical methods coupled with natural language processing can give some answers to this kind of problem. We have developed a term-clustering module called Galex (Graph Analyzer for LEXicometry). This paper uses random corpora to compare homogeneity parameters (precision, recall, extraction probability from a set of categories) against those of clusters obtained from a real corpus and a hand-made hierarchy related to the domain of the corpus.
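To make the comparison concrete, the following is a minimal sketch of evaluating a term cluster against a random baseline. It is not the Galex implementation: the function names, the toy categories, and the vocabulary are all hypothetical, and it simplifies the approach by drawing random clusters from a fixed vocabulary instead of generating random corpora as the abstract describes.

```python
import random


def precision_recall(cluster, category):
    """Precision and recall of a term cluster against one reference category."""
    cluster, category = set(cluster), set(category)
    overlap = len(cluster & category)
    precision = overlap / len(cluster) if cluster else 0.0
    recall = overlap / len(category) if category else 0.0
    return precision, recall


def best_precision(cluster, categories):
    """Highest precision the cluster reaches over all reference categories."""
    return max(precision_recall(cluster, cat)[0] for cat in categories)


def monte_carlo_baseline(cluster, categories, vocabulary, n_samples=1000, seed=0):
    """Return the observed score and the fraction of random same-size
    clusters that score at least as well (an empirical p-value)."""
    rng = random.Random(seed)
    vocab = sorted(vocabulary)  # fixed order so the seed gives repeatable draws
    observed = best_precision(cluster, categories)
    hits = 0
    for _ in range(n_samples):
        random_cluster = rng.sample(vocab, len(cluster))
        if best_precision(random_cluster, categories) >= observed:
            hits += 1
    return observed, hits / n_samples


# Hypothetical toy data standing in for a hand-made hierarchy and a corpus vocabulary.
categories = [
    {"gene", "protein", "enzyme", "dna"},
    {"cell", "tissue", "organ", "membrane"},
]
vocabulary = set().union(*categories) | {"table", "figure", "method", "result"}
cluster = ["gene", "protein", "dna"]

observed, p_value = monte_carlo_baseline(cluster, categories, vocabulary)
print(f"observed precision = {observed:.2f}, empirical p-value = {p_value:.3f}")
```

A small empirical p-value suggests that the cluster is more homogeneous with respect to the reference categories than a random grouping of the same size would be.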


Similar resources

Generic BRDF Sampling - A Sampling Method for Global Illumination

This paper introduces a new BRDF sampling method with reduced variance, which is based on a hierarchical adaptive parameterless PDF. This PDF is also based on rejection sampling with a bounded average number of trials, even in regions where the BRDF exhibits high variations. Our algorithm works in an appropriate way with both physical and analytical reflectance models. Reflected directions a...


An Importance Sampling Method for Arbitrary BRDFs

This paper introduces a new BRDF sampling method with reduced variance, which is based on a hierarchical adaptive PDF. This PDF is also based on rejection sampling with a bounded average number of trials, even in regions where the BRDF exhibits high variations. Our algorithm works in an appropriate way with physical, analytical, and measured reflectance models. Reflected directions are sampl...


Overelaxed hit-and-run Monte Carlo for the uniform sampling of convex bodies with applications in metabolic network analysis

The uniform sampling of convex regions in high dimension is an important computational issue, from both theoretical and applied points of view. Hit-and-run Monte Carlo algorithms are the most efficient methods known to perform it, and one of their bottlenecks lies in the difficulty of escaping from tight corners in high dimension. Inspired by optimized Monte Carlo methods used in statistical ...


Iterative Turbo Decoding Using Gibbs Sampling

This paper discusses an iterative multiuser receiver for code-division multiple access (CDMA) with forward error control coding. The receiver is derived from the maximum a posteriori (MAP) criterion for the joint received signal. A major drawback of the MAP receiver is its heavy computational cost, which grows exponentially with the number of users. An alternative solution is proposed here based on...


Cost-Driven Multiple Importance Sampling for Monte-Carlo Rendering

Global illumination or transport problems can also be considered as a sequence of integrals, and their Monte Carlo solutions as different sampling techniques. Multiple importance sampling takes advantage of different sampling strategies and combines the results obtained with them. In this paper we propose the combination of very different global illumination algorithms in a way that their st...


Journal title:

Volume   Issue 

Pages  -

Publication date 2006